Coronavirus genome replication is mediated by a multi-subunit protein
complex that is comprised of more than a dozen virally encoded and
several cellular proteins. Interactions of the viral replicase complex
with cis-acting RNA elements located in the 5’ and 3’-terminal genome
regions ensure the specific replication of viral RNA. Over the past
years, boundaries and structures of cis-acting RNA elements required
for coronavirus genome replication have been extensively characterized
in betacoronaviruses and, to a lesser extent, other coronavirus genera.
Here, we review our current understanding of coronavirus cis-acting
elements located in the terminal genome regions and use a combination
of bioinformatic and RNA structure probing studies to identify and
characterize putative cis-acting RNA elements in alphacoronaviruses.
The study suggests significant RNA structure conservation among members
of the genus Alphacoronavirus but also across genus boundaries. Overall,
the conservation pattern identified for 5’ and 3’-terminal RNA
structural elements in the genomes of alpha- and betacoronaviruses is in
agreement with the widely used replicase polyprotein-based
classification of the Coronavirinae, suggesting co-evolution of the
coronavirus replication machinery with cognate cis-acting RNA elements.

Keywords: RNA virus; Replication; cis-Acting element; RNA structure

1. Introduction 


Coronaviruses are enveloped, positive-strand RNA viruses with 
exceptionally large genomes of approximately 30 kb. They have 
been assigned to the subfamily Coronavirinae within the family 
Coronaviridae (de Groot et al., 2012a; Masters and Perlman, 2013). 
Together with the families Arteriviridae, Roniviridae, and Mesoniviri- 
dae, the Coronaviridae form the order Nidovirales (de Groot et al., 
2012b). The subfamily Coronavirinae currently comprises four 
genera called Alpha-, Beta-, Gamma- and Deltacoronavirus. Closely 
related virus species in these genera are grouped together in spe- 
cific lineages. Coronaviruses infect mammals and birds and include 
pathogens of major medical, veterinary and economic interest 
(de Groot et al., 2012a), with severe acute respiratory syndrome 
(SARS) coronavirus (SARS-CoV) and Middle East respiratory syn- 
drome (MERS) coronavirus (MERS-CoV) providing two prominent 
examples of newly emerging, highly pathogenic coronaviruses in 
humans (Drosten et al., 2003; Ksiazek et al., 2003; Zaki et al., 2012). 

Besides their large genome sizes, coronaviruses and related 
nidoviruses stand out from other plus-strand RNA viruses by the 
large number of virally encoded nonstructural proteins that are 
either involved in viral RNA synthesis or interact with host cell 
functions (reviewed in Masters and Perlman, 2013; Ziebuhr, 2008). 
Most of the nonstructural proteins (nsp) are expressed as sub- 
domains of two large replicase gene-encoded polyproteins called 
pp1a (~450 kDa) and pp1ab (~750 kDa). Co- and posttranslational 
cleavage of pp1a/1ab by two types of virus-encoded proteases 
gives rise to a total of 15-16 mature proteins that (together with 
the nucleocapsid protein and cellular proteins) form the viral 
replication-transcription complex (RTC) (Almazan et al., 2004; 
Schelle et al., 2005; Ziebuhr, 2008; Ziebuhr et al., 2000). This 
multi-protein complex replicates the viral genome and produces 
an extensive set of 3’-coterminal subgenomic messenger RNAs (sg 
mRNAs), the latter representing a hallmark of corona- and other 
nidoviruses (Pasternak et al., 2006; Sawicki et al., 2007; Ziebuhr 
and Snijder, 2007). The sg mRNAs are used to express open read- 
ing frames located in the 3’-proximal third of the genome. They 
essentially encode the viral structural proteins, such as the nucleo- 
capsid (N), membrane (M), spike (S) and envelope (E) proteins, and 
a varying number of accessory proteins, the latter often involved 
in functions that counteract antiviral host responses (Masters and 
Perlman, 2013; Narayanan et al., 2008b). Coronavirus sg mRNAs 
contain a common 5’ leader sequence (approximately 60-95 nt) 
that is identical to the 5’ end of the genome. The complement of 
this sequence is attached to the 3’ end of nascent negative strands 
in a complex process called “discontinuous extension of negative 
strands” (Sawicki and Sawicki, 1995, 1998; Sawicki et al., 2007; 
Sethna et al., 1991). This process involves attenuation of negative- 
strand RNA synthesis at transcription-regulating sequences (TRS) 
located upstream of the individual ORFs in the 3’-proximal genome 
region (“body TRSs”, TRS-B). Guided by basepairing interactions 
between the negative-strand complement of a TRS-B and the 
TRS located downstream of the 5’ leader on the viral genome 
(“leader TRS”, TRS-L), the nascent minus strand may be transferred 
from its downstream position on the template (at the TRS-B) to 
the TRS-L, where negative-strand RNA synthesis is then resumed 
and completed by copying the 5’ leader sequence. The set of 3’ 
antileader-containing sg minus-strand RNAs is subsequently used 
as templates for the production of the characteristic nested set of 
5’ capped, 5’ leader-containing and 3’-polyadenylated sg mRNAs 
in coronavirus-infected cells (Lai et al., 1983; Sawicki et al., 2001; 
Sawicki and Sawicki, 1995; Sethna et al., 1989; Spaan et al., 1983). 
Sg minus-strand RNAs contain a U-stretch at their 5’ end, thus pro- 
viding a template for 3’ polyadenylation of sg mRNAs (Hofmann 
and Brian, 1991; Wu et al., 2013). 
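
The template switch described above can be illustrated with a minimal Python sketch. It is not a model of the replicase: it only locates copies of a TRS core sequence in a plus-strand sequence and checks that the complement synthesized at a body TRS can base-pair with the leader TRS. The toy genome and all function names are hypothetical; the core sequence used, 5’-ACUAAAC-3’, is the TGEV TRS-L core sequence quoted in Section 2.3.

```python
# Watson-Crick partners for RNA bases.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_all(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start position of every occurrence of motif."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


CORE = "ACUAAAC"  # TGEV TRS-L core sequence as quoted in the text

# Hypothetical toy "genome": a leader TRS followed by one body TRS.
TOY_GENOME = "GGAU" + CORE + "GAAAUCCGGUUGC" + CORE + "AUGGCU"

sites = find_all(TOY_GENOME, CORE)
trs_l, body_sites = sites[0], sites[1:]

# Sequence at the 3' end of the nascent minus strand after copying a TRS-B.
anti_trs_b = reverse_complement(CORE)

# The nascent strand can pair with the TRS-L if its complement equals
# the plus-strand sequence found at the leader TRS.
leader_core = TOY_GENOME[trs_l:trs_l + len(CORE)]
can_pair_with_leader = reverse_complement(anti_trs_b) == leader_core

print("TRS-L at", trs_l, "TRS-B at", body_sites)
print("anti-TRS-B:", anti_trs_b, "pairs with TRS-L:", can_pair_with_leader)
```

In real genomes the body TRSs and their flanking sequences are not always identical to the leader TRS, so an exact string match is a simplification.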

Similar to other RNA viruses, coronavirus genomes contain 
important RNA signals in their 5’ and 3’-terminal genome regions, 
mainly (but not exclusively) in the untranslated regions (UTR). 
These signals are required for viral RNA synthesis (replication 
and/or transcription) and are collectively referred to as cis-acting 
RNA elements (Chang et al., 1994; Dalton et al., 2001; Izeta et al., 
1999; Kim et al., 1993; Liao and Lai, 1994; Lin et al., 1994, 1996; 
Zhang et al., 1994). As indicated above, coronaviruses also contain 
cis-acting elements at internal positions in the genome, the best 
documented ones being the leader and body TRSs which play a vital 
role in the transfer of the nascent minus-strand RNA to an upstream 
position on the template (see above). Other internal cis-acting 
elements include specific RNA signals required for genome packaging, 
which have been characterized in a small number of coronaviruses 
(Chen et al., 2007; Escors et al., 2003; Makino et al., 1990; Morales 
et al., 2013; Penzes et al., 1994), and a complex RNA pseudoknot 
structure located in the ORF1a-ORF1b overlap region that mediates 
a (-1) ribosomal frameshift event and thus controls the expression 
of the second large ORF on the coronavirus genome RNA (ORF1b) 
(Brierley et al., 1987, 1989; de Haan et al., 2002; Namy et al., 2006). 

Both viral and cellular proteins, including the viral N pro- 
tein, heterogeneous ribonucleoprotein (hnRNP) family members, 
polypyrimidine tract-binding protein, and poly(A)-binding protein 
(PABP), have been shown to bind to specific coronavirus cis-acting 
elements and there is evidence to support the biological signifi- 
cance of some of these protein-RNA interactions (reviewed in Shi 
and Lai, 2005; Sola et al., 2011b). 

The coronavirus RTC is a multi-subunit assembly comprised 
of more than a dozen viral and an unknown number of cellu- 
lar proteins. The complex is anchored through transmembrane 
domains present in nsps 3, 4 and 6 to intracellular membranous 
structures that provide a specialized membrane-shielded compart- 
ment in (or at) which viral RNA synthesis takes place (den Boon 
and Ahlquist, 2010; Gosert et al., 2002; Kanjanahaluethai et al., 
2007; Knoops et al., 2008; Oostra et al., 2008; Oostra et al., 2007; 
Snijder et al., 2006; van Hemert et al., 2008). Over the past years, 
viral components of the coronavirus RTC have been character- 
ized in considerable detail, providing a wealth of functional and 
structural information (reviewed in Imbert et al., 2010; Masters, 
2006; Ulferts et al., 2010; Ulferts and Ziebuhr, 2011; Ziebuhr, 
2008). A large number of virally encoded enzymes, including 
protease, ADP-ribose-1”-phosphatase, NTPase, 5’-to-3’ helicase, 
RNA 5’-triphosphatase, RNA polymerase, guanosine-N7 and ribose 
2’-O methyltransferases, 3’-to-5’ exoribonuclease and uridylate- 
specific endoribonuclease, have been identified (Baker et al., 1989; 
Bhardwaj et al., 2004; Chen et al., 2009, 2011; Decroly et al., 2008, 
2011; Eckerle et al., 2010; Ivanov et al., 2004; Kanjanahaluethai and 
Baker, 2000; Minskaia et al., 2006; Putics et al., 2005; Saikatendu 
et al., 2005; Seybert et al., 2000; te Velthuis et al., 2010; Ziebuhr 
et al., 1995). In some cases, these activities could be linked to spe- 
cific steps of viral RNA synthesis and/or RNA processing or were 
shown to interfere with cellular functions (reviewed in Masters 
and Perlman, 2013; Ziebuhr, 2008). There are multiple interactions 
between the individual replicase gene-encoded nsps and the struc- 
tural basis and functional implications of these interactions have 
been studied in a few cases. For example, it has been shown that the 
exoribonuclease and ribose 2’ O-methyltransferase activities asso- 
ciated with nsp14 and nsp16, respectively, are each stimulated by 
specific interactions with nsp10 (Bouvet et al., 2014; Decroly et al., 
2011). Also, there is evidence that a heteromultimeric complex 
formed by nsp7 and nsp8 interacts with (and serves as a processiv- 
ity factor for) the RNA-dependent RNA polymerase (RdRp, nsp12) 
(Zhai et al., 2005). Additional interactions between individual sub- 
units of the RTC have been suggested on the basis of two-hybrid 
screening data (Pan et al., 2008; von Brunn et al., 2007) and there 
is evidence that a substantial number of coronavirus nsps form 
homo- and/or heterooligomeric complexes (Anand et al., 2002, 
2003; Bouvet et al., 2014; Chen et al., 2011; Ricagno et al., 2006; 
Su et al., 2006; Xiao et al., 2012; Zhai et al., 2005). 

Despite major progress in the characterization of proteins 
and cis-acting RNA elements involved in coronavirus RNA syn- 
thesis, the molecular mechanisms that mediate specific steps of 
coronavirus RNA replication and transcription are far from being 
understood. Important information on cis-acting RNA elements has 
been obtained from studies using defective interfering (DI) RNAs 
and, more recently, genetically engineered coronavirus mutants 
(reviewed in Brian and Baric, 2005; Masters, 2007; Sola et al., 
2011b). 

In this article, we will summarize previous work on 
(beta)coronavirus genomic cis-acting RNA elements and will then 
move on to present conclusions from our recent bioinformatic and 
RNA structure probing studies on cis-acting RNA elements con- 
served in alphacoronaviruses. 


2. Identification and delineation of coronavirus cis-acting 
elements 


Historically, cis-acting RNA elements required for coronavi- 
rus RNA synthesis have mainly been studied by using naturally 
occurring and genetically engineered defective interfering RNAs 
(DI RNAs) (reviewed in Brian and Baric, 2005; Brian and Spaan, 
1997; Masters, 2007; Sola et al., 2011b). DI RNAs are replication- 
competent genome-derived RNA molecules that contain extensive 
internal deletions but retain all the cis-acting RNA signals 
required for replication by functional replicase complexes provided 
by a helper virus that replicates in the same cell (Levis et al., 1986; 
Weiss et al., 1983). Essentially, these cis-acting elements comprise 
the untranslated 5’- and 3’-terminal genome regions but, in some 
cases, may also extend into coding regions. Occasionally, they also 
contain noncontiguous sequences derived from internal genome 
regions. Coronavirus DI RNAs were first described for the beta- 
coronaviruses MHV and BCoV (Chang et al., 1994; de Groot et al., 
1992; Hofmann et al., 1990; Luytjes et al., 1996; Makino et al., 1985, 
1988a, 1988b, 1984). Subsequently, these studies were extended to 
alpha- and gammacoronaviruses (Izeta et al., 1999; Mendez et al., 
1996; Penzes et al., 1994, 1996). Over the years, studies of defec- 
tive genomes proved to be very useful for identifying coronavirus 
RNA elements required for replication (and packaging). However, 
DI RNAs also have disadvantages, a major one being homologous 
recombination between the RNA replicon and the helper virus 
genome. Thus, for example, artificial DI RNAs containing mutant 5’ 
leader sequences rapidly acquire the leader sequence of the helper 
virus, a process commonly referred to as “leader switching” (Chang 
et al., 1994, 1996; Makino and Lai, 1990). Leader switching occurs 
very often when poorly replicating mutant DI RNAs are characterized, 
because these generally require amplification by serial passaging 
before their phenotype can be determined. With the development of 
reverse-genetic systems suitable to produce and manipulate full-length 
coronavirus cDNA copies, an attractive alternative for studying cis- 
acting RNA elements at the genome level (including long range 
RNA-RNA interactions) is now available that overcomes some of the 
limitations of DI RNA-based systems (Almazan et al., 2000; Casais 
et al., 2001; Scobey et al., 2013; Tekes et al., 2008; Thiel et al., 2001; 
van den Worm et al., 2012; Yount et al., 2000, 2003). 


2.1. Delineation of 5’ cis-acting elements 


DI RNA studies on representative betacoronaviruses revealed 
that less than 500 nt from the 5’ end of the genome (466 nt in 
MHV and 498 nt in BCoV) are required for replication (Chang et al., 
1994; Kim et al., 1993; Luytjes et al., 1996). In subsequent studies 
on alpha- and gammacoronaviruses, minimal 5’ cis-acting signals 
of 649 and 544 nt were determined for TGEV (Escors et al., 2003) 
and IBV, respectively (Dalton et al., 2001). The 5’-terminal genome 
regions of 466 nt (MHV) to 649 nt (TGEV) comprise the entire 5’ 
UTR ranging in size from 210 nt (MHV, BCoV and HCoV-OC43) to 
314 nt (TGEV) and thus extend into the nsp1-coding region of ORF 
1a (see below). By contrast, the gammacoronavirus IBV features 
a much larger 5’ UTR (528 nt) (Boursnell et al., 1987) and lacks 
a counterpart of nsp1 (Ziebuhr et al., 2001). It therefore appears 
that the gammacoronavirus 5’ UTR (alone) contains all the 5’ RNA 
signals required for genome replication. 
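
The relationship between the mapped minimal regions and the 5’ UTR lengths can be made explicit with a short calculation. The sketch below uses only the lengths quoted in this section; the dictionary layout and function name are illustrative. Because the minimal regions were mapped by deletion analysis, the differences are best read as upper bounds on how far the required signals extend beyond the 5’ UTR.

```python
# (minimal 5' cis-acting region, 5' UTR length) in nt, as quoted in the text.
REGIONS = {
    "MHV": (466, 210),
    "BCoV": (498, 210),
    "TGEV": (649, 314),
    "IBV": (544, 528),
}


def extension_beyond_utr(minimal_region: int, utr_length: int) -> int:
    """Number of nt by which the mapped region exceeds the 5' UTR."""
    return max(0, minimal_region - utr_length)


extensions = {
    virus: extension_beyond_utr(minimal, utr)
    for virus, (minimal, utr) in REGIONS.items()
}

for virus, extra in extensions.items():
    print(f"{virus}: mapped region exceeds the 5' UTR by {extra} nt")
```

For MHV, BCoV and TGEV the mapped regions reach roughly 250 to 340 nt into ORF1a, whereas for IBV the difference is only 16 nt.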


2.2. Functional and structural features of coronavirus 5’ 
cis-acting elements 


Cis-acting RNA structures in the 5’-terminal region of the 
coronavirus genome were first studied for BCoV using DI RNA-based 
systems (Brown et al., 2007; Chang et al., 1994, 1996; Gustin 
et al., 2009; Raman et al., 2003; Raman and Brian, 2005). In the 5’- 
terminal 215 nts of the BCoV genome, four stem-loops (designated 
I [comprised of Ia and Ib], II, III, and IV) were defined. Enzymatic 
probing and mutational analysis of both naturally occurring and 
genetically engineered DI RNAs were used to (i) corroborate the 
predicted RNA secondary structures and (ii) determine their func- 
tional significance in DI RNA replication. More recently, two further 
stem-loops (called SL-V and SL-VI) were identified in the BCoV 
nsp1-coding region of which SL-VI was confirmed to be essential 
for DI RNA replication (Brown et al., 2007). 

Subsequent studies suggested (a varying degree of) conserva- 
tion of 5’ cis-acting elements among betacoronaviruses and even 
the more distantly related alpha- and gammacoronaviruses (Chen 
and Olsthoorn, 2010; Kang et al., 2006). To facilitate the discussion 
of data obtained in studies of different viruses by different labora- 
tories, we will use in this article a uniform nomenclature for the 
main RNA structural elements in the 5’-proximal genome region 
(SL1, SL2, [SL3 if present], SL4 and SL5). The nomenclature is based 
on SL designations used by the Leibowitz and Giedroc laborato- 
ries and in predictions of genus- and lineage-specific conservation 
patterns of 5’ cis-acting elements in alpha-, beta- and gammacoro- 
naviruses (Chen and Olsthoorn, 2010; Kang et al., 2006; Liu et al., 
2006, 2007). The proposed functional and structural conservation 
of 5’ cis-acting elements among betacoronaviruses received strong 
support by reverse-genetic data demonstrating that SARS-CoV SL1, 
SL2, and SL4 can functionally replace their counterparts in the MHV 
genome when introduced individually (Kang et al., 2006). By con- 
trast, replacement of the entire MHV 5’ UTR with that of SARS-CoV 
did not produce a viable MHV mutant, suggesting a requirement 
for additional stable or transient long-range RNA-RNA interactions 
of the 5’ UTR with other genome regions. Evidence to support this 
hypothesis was obtained in subsequent studies. For example, the 
energetically unstable lower part of MHV SL1 was found to be 
involved in long-range RNA interactions with the 3’ UTR (Li et al., 
2008) (see below). Also, in a study using MHV/BCoV chimera, a 
region downstream of SL4 was revealed to be engaged in long-range 
interactions with the nsp1-coding region, thus possibly forming an 
extensive higher-order RNA structure (Guan et al., 2012). A sub- 
sequent BCoV DI RNA mutagenesis study (Su et al., 2014) further 
suggested that this multipartite RNA structure may involve several 
stem-loop (sub)structures identified in earlier studies (Gustin et al., 
2009; Raman and Brian, 2005) but require refolding of other RNA 
structures suggested earlier to be essential for DI RNA replication 
(Brown et al., 2007). The study by Su et al. (2014) also identified an 
intriguing requirement in cis of an oligopeptide sequence in the N- 
proximal part of nsp1, suggesting that nsp1 may be an essential 
cis-acting protein factor in betacoronavirus replication, in addi- 
tion to its multiple other functions (Brockway and Denison, 2005; 
Huang et al., 2011a,b; Kamitani et al., 2006, 2009; Lei et al., 2013; 
Lokugamage et al., 2012; Narayanan et al., 2008a; Tanaka et al., 
2012; Tohya et al., 2009; Wathelet et al., 2007; Züst et al., 2007). 
Possible interacting partners for nsp1 remain to be identified. 

The 5’-proximal SL1 and SL2 are predicted to be conserved 
across all genera of the Coronavirinae (Chen and Olsthoorn, 2010; 
Liu et al., 2007). Nuclear magnetic resonance (NMR) spectroscopy 
studies of MHV and HCoV-OC43 SL1 RNAs supported the predicted 
stem-loop and revealed 2-3 noncanonical base pairs in the mid- 
dle of the stem. The fully base-paired SL1 was proposed to exist 
in equilibrium with higher-energy (partially unfolded) conformers. 
Characterization of MHV mutants containing specific replacements 
in SL1 and sequence analysis of second-site revertants support a 
“dynamic SL1” model in which SL1 is structurally and functionally 
bipartite (Li et al., 2008). While the upper part of SL1 is (required 
to be) stable, the lower part is (required to be) unstable, possibly 
indicating the requirement for an optimized stability to permit or 
fine-tune transient long-range (RNA- or protein-mediated) interac- 
tions between the 5’ and 3’ UTRs required for sgRNA transcription 
and genome replication, respectively. 

SL2 is the most conserved structure in the coronavirus 5’ UTR 
(Chen and Olsthoorn, 2010; Liu et al., 2007). It is comprised of 
a 5-bp stem and a highly conserved loop sequence, 5’-CUUGY-3’, 
that was shown to adopt a 5’-uYNMG(U)a or uCUYG(U)a-like 
tetraloop structure (Lee et al., 2011; Liu et al., 2009). Reverse genet- 
ics data confirmed that SL2 is required for MHV replication and may 
have a specific role in sgRNA synthesis. Within certain structural 
constraints, nucleotide replacements were found to be tolerated 
or could be rescued by increasing the stem stability, suggesting 
a limited plasticity of this important cis-acting RNA element (Liu 
et al., 2009). 
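
Loop motifs such as 5’-CUUGY-3’ are written with IUPAC ambiguity codes (Y denotes C or U), so searching for them in a sequence requires expanding each code into the set of bases it stands for. The following is a minimal sketch of such a search using Python regular expressions. The function names and the test sequences are hypothetical and are not taken from any coronavirus genome.

```python
import re

# IUPAC nucleotide codes for RNA, expanded to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "M": "[AC]", "K": "[GU]",
    "W": "[AU]", "S": "[CG]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def find_motif(sequence: str, motif: str) -> list[int]:
    """Return 0-based start positions of all (overlapping) motif matches."""
    pattern = "".join(IUPAC[code] for code in motif.upper())
    # A lookahead makes the match zero-width, so overlapping hits are found.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]


# Hypothetical test sequences, used only to exercise the function.
hits = find_motif("GAUCUCUUGUAGAUC", "CUUGY")
no_hits = find_motif("AAUUUGUAA", "CUUGY")

print(hits, no_hits)
```

A sequence match alone does not show that the motif sits in a loop; that requires a structure prediction or probing data.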

The predicted SL3 (named SL-II in BCoV DI RNA studies) appears 
to be conserved only in a subset of beta- and gammacoronaviruses 
(Chen and Olsthoorn, 2010). For BCoV and closely related viruses, 
the TRS-L core sequence (CS) has been proposed to be exposed 
in the SL3 loop region, a structure similar to the TRS-L hairpin 
structure reported for the related arterivirus equine arteritis virus 
(EAV) (Chang et al., 1996; van den Born et al., 2004, 2005). By 
contrast, in most other coronaviruses, the TRS-L CS and flanking 
regions were suggested to be located in nonstructured regions 
(Chen and Olsthoorn, 2010; Stirrups et al., 2000; Wang and Zhang, 
2000). 

Fig. 1. Alignment-based secondary structure prediction of 5’ genome regions of betacoronaviruses. The viruses included in this analysis represent all currently recognized 
lineages and species in the genus Betacoronavirus. The alignment was generated using LocARNA and the structure was calculated using RNAalifold. The consensus sequence 
is represented using the IUPAC code: A (adenine), C (cytosine), G (guanine), U (uracil), R (purine [A or G]), Y (pyrimidine [C or U]), M (C or A), K (U or G), W (U or A), S (C or G), 
B (C, U, or G [not A]), D (A, U, or G [not C]), H (A, U, or C [not G]), V (A, C, or G [not U]), N (any base). Colors are used to indicate conserved base pairs: from red (conservation of 
only one base-pair type) to purple (all six base-pair types are found); from dark (all sequences contain this base pair) to light colors (1 or 2 sequences are unable to form this 
base pair). The gray bars below the alignment indicate the extent of sequence conservation. Gray shadows are used to link RNA structures with the corresponding dot-bracket 
notations above the alignment. To refine the alignment, an anchor at the highly conserved apical loop of SL2 was used. 

SL4 is a long hairpin located downstream of the TRS-L CS and 
suggested to be conserved in all coronavirus genera (Chen and 
Olsthoorn, 2010; Raman et al., 2003; Raman and Brian, 2005). 
In the vast majority of coronaviruses, SL4 contains a short ORF 
comprised of only a few codons. Because of its position in the 
genome upstream of ORF1a it is generally referred to as the uORF. 
Recent reverse genetics work in the MHV system (Wu et al., 2014; 
Yang et al., 2011) showed that disruption of the uORF yields 
viable mutants that, however, acquire alternate uORFs upon serial 
passaging in cell culture. In vitro, uORF-disrupted RNAs showed 
enhanced translation of the downstream ORF. The available data 
suggest that the uORF represses ORF1a/1b translation and has a 
beneficial but non-essential role in coronavirus replication in cell 
culture. SL4 may be further subdivided into SL4a and SL4b. Inter- 
estingly, despite its conservation in coronaviruses, SL4 tolerates 
extensive mutations. Thus, for MHV, it was shown that base pair- 
ing in SL4a is not required for replication and separate deletions 
of SL4a and SL4b are tolerated. By contrast, complete deletion of 
SL4 and a 3-nt deletion immediately downstream of SL4 abolished 
or profoundly impaired viral RNA synthesis. The characterization 
of second-site mutations and a viable MHV mutant in which SL4 
was replaced with a shorter sequence-unrelated stem-loop led to a 
model in which SL4 was proposed to function as a spacer element 
that controls the orientation of SL1, SL2, and TRS-L and thereby 
directs subgenomic RNA synthesis (Yang et al., 2011). The SL4 
sequence overlaps with the ‘hotspot’ of the 5’-proximal genomic 
acceptor required for BCoV discontinuous transcription (Wu et al., 
2006), thus further supporting a role of the region immediately 
downstream of TRS-L in subgenomic RNA synthesis. 

Chen and Olsthoorn (2010) predicted that variations of another 
RNA structure called SL5 may be conserved in specific coronavirus 
genera and/or lineages. In alpha- and betacoronaviruses, 
SL5 extends into ORF1a. Depending on the lineage studied, conserved 
loop sequences could be identified in substructural hairpins 
of SL5. Sequence conservation was found to be more pronounced 
in alpha- compared to betacoronaviruses. Thus, for example, in 
alphacoronaviruses, three hairpins, called SL5a, 5b, and 5c, each 
containing a 5’-UUCCGU-3’ loop sequence, were identified. Simi- 
lar structures were only partly conserved in betacoronaviruses and 
significant lineage-specific variations in the substructural hairpins 
and their loop sequences were identified. A possible SL5 equivalent 
in gammacoronaviruses was predicted to adopt a rod-like struc- 
ture that lacks conserved loop sequences (Chen and Olsthoorn, 
2010). As outlined above, possible betacoronavirus SL5 substruc- 
tures located within (or extending into) the nsp1-coding region 
(termed SLs IV, V, VI, and VII) have been characterized structurally 
and functionally using BCoV DI RNA and MHV reverse genetics systems 
(Brown et al., 2007; Guan et al., 2011, 2012; Raman and Brian, 
2005). 


2.3. Characterization of alphacoronavirus 5’-proximal RNA 
structures 


To a large extent, previous work on coronavirus cis-acting ele- 
ments has focused on only two species of closely related lineage A 
betacoronaviruses (represented by BCoV and MHV), while there is 
limited information on functionally and structurally related ele- 
ments in other coronaviruses. We therefore decided to embark 
on a more detailed characterization of putative alphacoronavirus 
cis-acting elements located in the 5’ and 3’ genome regions. As 
a starting point, we used 20 coronavirus genomes representing 
all the currently approved species from the four genera of the 
Coronavirinae and the newly identified MERS-CoV (de Groot et al., 
2013) and calculated alignment-based secondary structures using 
LocARNA (v.1.7.2) (Will et al., 2012). Although LocARNA considers 
both sequence and secondary structure to calculate these mul- 
tiple alignments, we failed to detect RNA secondary structures 
conserved across all coronavirus genera when using these highly 
diverse sequences. We therefore resorted to producing separate 
genus-wide alignment-based secondary structure predictions for 
coronaviruses. To do this, we used complete 5’ UTR regions and 
20 nts from the ORF1a 5’ end to calculate consensus structures 
with RNAalifold -r -p --color --noLP --MEA (v.2.0.7) (Lorenz et al., 
2011) and -C, respectively, if constraints were used. The subgroups 
(lineages) obtained in these sequence alignments were consistent 
with the previously recognized subgroups of alpha- and betacoro- 
naviruses (de Groot et al., 2012a, 2013) (not shown). As shown 
in Fig. 1, all the functionally relevant RNA secondary structure 
elements identified previously in the betacoronavirus 5’ genome 
regions, SL1, SL2 and SL4, could be identified. In line with previous 
predictions and despite the pronounced sequence diversity in the 
5’-terminal genome regions, these RNA structures are suggested to 
be conserved among all currently approved betacoronavirus lin- 
eages and species. This conclusion is also supported by the large 
number of covariations, suggesting a strong selection pressure to 
retain these base-pairing interactions. Fig. 1 also shows the lack 
of conservation of a stable hairpin structure containing the TRS-L. 
It should be noted that, for several betacoronaviruses, it is possi- 
ble to force a stem-loop containing the TRS-L, but this stem-loop 
is only supported by two conserved base pairs. In line with pre- 
vious reports, bovine coronavirus (and other viruses belonging to 
the species Betacoronavirus 1) appear(s) to be an exception in that 
a more stable SL3 containing the 5’-UCUAAAC-3’ sequence in the 
loop region can be predicted in this case. Overall, the structure pre- 
diction shown in Fig. 1 turned out to be in perfect agreement with 
previous studies (see Section 2.2). 
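
The predicted structures in Fig. 1 are summarized in dot-bracket notation, in which matching parentheses mark the two partners of a base pair and dots mark unpaired positions. As a minimal sketch of how such a string can be converted into an explicit list of base pairs, the following Python function uses a stack. The function names are illustrative, and the example string is a hypothetical hairpin with a 5-bp stem and a 5-nt loop, not a structure taken from the figure.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Convert a dot-bracket string into sorted (i, j) base pairs (0-based)."""
    open_positions = []  # stack of positions with an unmatched "("
    pairs = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((open_positions.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


def unpaired_positions(structure: str) -> list[int]:
    """Return the 0-based positions that are not part of any base pair."""
    return [index for index, symbol in enumerate(structure) if symbol == "."]


# Hypothetical hairpin: 5-bp stem enclosing a 5-nt loop.
hairpin = "(((((.....)))))"
hairpin_pairs = parse_dot_bracket(hairpin)
hairpin_loop = unpaired_positions(hairpin)

print(hairpin_pairs)
print(hairpin_loop)
```

This simple parser handles nested structures only; pseudoknots, which are written with additional bracket types, would need a separate stack per bracket type.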

We therefore used this approach in subsequent studies of con- 
served RNA structure elements located in the 5’ genome regions of 
alphacoronaviruses. Predictions were verified and refined by RNA 
structure probing analyses (Ehresmann et al., 1987; Qu et al., 1983) 
using in vitro-transcribed RNAs with sequences corresponding to 
the 5’-terminal genome regions of HCoV-229E and HCoV-NL63, 
respectively (to be published elsewhere). For the calculation of sec- 
ondary structures of single sequences, we used RNAfold --noLP 
(v.2.0.7) (Lorenz et al., 2011) and -C, respectively, if constraints 
were used. 
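
For readers who wish to repeat this kind of analysis, the program calls named above can be assembled as follows. This is only a sketch: it builds the command lines from the options quoted in the text (with -C added when constraints are used) and does not run them. The alignment file name is hypothetical, and passing the alignment as a positional argument is an assumption that should be checked against the manual of the installed ViennaRNA version.

```python
import shlex


def rnaalifold_command(alignment_file: str, constraints: bool = False) -> list[str]:
    """Build the RNAalifold call with the options quoted in the text."""
    command = ["RNAalifold", "-r", "-p", "--color", "--noLP", "--MEA"]
    if constraints:
        command.append("-C")  # added only when structure constraints are used
    command.append(alignment_file)  # assumption: alignment passed positionally
    return command


def rnafold_command(constraints: bool = False) -> list[str]:
    """Build the RNAfold call used for single sequences."""
    command = ["RNAfold", "--noLP"]
    if constraints:
        command.append("-C")
    return command


# Hypothetical input file name, used for illustration only.
alifold = rnaalifold_command("alpha_5prime.aln", constraints=True)
fold = rnafold_command()

print(shlex.join(alifold))
print(shlex.join(fold))
```

Building the argument list separately from execution makes it easy to log the exact call that produced a given prediction.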

Using multiple alignments calculated with LocARNA, we were 
able to identify RNA secondary structures conserved among (all) 
alpha- and betacoronaviruses, thus confirming and extending ear- 
lier studies (Figs. 1 and 2) (Chen and Olsthoorn, 2010; Raman et al., 
2003). These include (i) SL1, (ii) SL2 (with its short stem and highly 
conserved loop region [5’-UUUGU-3’ in alphacoronaviruses]), (iii) a 
poorly structured region containing the TRS-L CS and some flanking 
sequence, (iv) SL4 (containing the uORF) and (v) SL5, the latter being 
conserved very well among all alphacoronavirus species (Fig. 2) 
and significantly more diverse in betacoronaviruses (Chen and 
Olsthoorn, 2010). The large number of covariant base pairs in the 
alphacoronavirus SL5 (Fig. 2) suggest significant constraints and a 
major functional role for this structure in alphacoronavirus (and, 
possibly, betacoronavirus) replication and there is indeed some 
experimental evidence to support this hypothesis (Brown et al., 
2007; Su et al., 2014). Using RNA structure probing information 
obtained for the 5’-terminal 600 nts of HCoV-229E and HCoV-NL63, 
we confirmed and refined our RNA secondary structure predic- 
tions for two B-lineage alphacoronaviruses (Figs. 3 and 4). The data 
obtained in these studies (details to be published elsewhere) sup- 
port a model in which the ~310-nt 5’ genome regions consistently 
fold into 4 major RNA structures called SL1, SL2, SL4, and SL5. The 
latter contains 3 hairpin substructures, SL5a, 5b, and 5c, featuring 
highly conserved 5(6)-nt loop sequences. 
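
Covariation, as used above, means that two alignment columns change together so that base pairing is preserved. A minimal Python sketch of how this can be counted is given below. It assumes the six base-pair types referred to in the legend of Fig. 1 (the four Watson-Crick orientations plus G-U and U-G); the three-sequence alignment and the function names are hypothetical.

```python
# The six base-pair types: Watson-Crick pairs plus G-U wobble pairs.
CANONICAL_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def pair_types(alignment: list[str], i: int, j: int) -> set[str]:
    """Distinct canonical base-pair types formed by columns i and j."""
    observed = {sequence[i] + sequence[j] for sequence in alignment}
    return observed & CANONICAL_PAIRS


def unable_to_pair(alignment: list[str], i: int, j: int) -> int:
    """Number of sequences that cannot form a canonical pair at (i, j)."""
    return sum(
        1 for sequence in alignment
        if sequence[i] + sequence[j] not in CANONICAL_PAIRS
    )


# Hypothetical gap-free alignment of three 12-nt sequences.
TOY_ALIGNMENT = [
    "GGAUUUGUAUCC",
    "GCAUUUGUAUGC",
    "GAAUUUGUAUUC",
]

outer = pair_types(TOY_ALIGNMENT, 0, 11)   # same pair in every sequence
inner = pair_types(TOY_ALIGNMENT, 1, 10)   # pair type changes, pairing kept

print(sorted(outer), sorted(inner))
print(unable_to_pair(TOY_ALIGNMENT, 1, 10))
```

Columns with several pair types and no sequence unable to pair, like columns 1 and 10 here, are the ones that support a conserved helix most strongly.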

The consensus secondary structure predicted for alphacoron- 
aviruses (Fig. 2) was found to fit very well the individual structure 
predictions for HCoV-229E and HCoV-NL63 (Figs. 3A and 4A) and 
the inclusion of structure probing information as additional con- 
straints required only very few minor adjustments in our structure 
predictions (Figs. 3B and 4B). Most importantly, the basal part of the 
predicted SL4 was now predicted to be unpaired, thereby extending 
the single-stranded region downstream of the TRS-L and also affect- 
ing the spacing between SL4 and SL5. The (predicted) basal stem of 
SL4 contains the most conserved sequence within the alphacoron- 
avirus 5’-terminal RNA structural elements (Fig. 2, see the red base 
pairs). It is therefore reasonable to think that this structurally flex- 
ible region is involved in long-range RNA-RNA interactions. In line 
with this idea, a previous TGEV reverse genetic study showed that 
mutants permitting additional base-pairing interactions of the copy 
TRS-B upstream of a reporter sgRNA with the 5’-GAAA-3’ sequence 
immediately downstream of the TGEV TRS-L CS (5’-ACUAAAC-3’) 
(see also Fig. 2) enhance the production of this specific reporter 
sgRNA (Zúñiga et al., 2004). These functional data and our structural 
analyses of alphacoronavirus 5’-terminal genome regions lead 
us to suggest that the basal part of SL4 exists in a flexible state, 
thereby possibly facilitating strand transfer during sg minus-strand 
RNA synthesis. Both this RNA structural flexibility and the role of 
proteins that bind in this region and thereby likely affect the SL4 
structure remain to be further investigated. Of note, hnRNP family 
members along with the viral N protein have been shown to bind 
in this region and the N protein has been suggested to have chap- 
erone and TRS-L/TRS-B unwinding activities (Galan et al., 2009; 
Grossoehme et al., 2009; Huang and Lai, 1999; Keane et al., 2012; 
Li et al., 1997, 1999; Shi and Lai, 2005; Sola et al., 2011a,b; Zúñiga
et al., 2007, 2010). It is therefore tempting to speculate that cellular
and/or viral proteins bind and unwind the energetically labile SL4 
substructure to facilitate the strand transfer during sg minus-strand 
RNA synthesis. 


2.4. Delineation of 3’ cis-acting elements required for coronavirus
replication 


Initial information on 3’ cis-acting elements required for RNA 
replication was (again) derived from betacoronavirus DI RNA stud- 
ies (Kim et al., 1993; Lin and Lai, 1993; Luytjes et al., 1996). Deletion 
mutagenesis data suggested that 3’ cis-acting elements encompass 
the entire 301-nt 3’ UTR plus poly(A) tail, along with a portion of 
the nucleocapsid (N) protein gene. However, subsequent studies 
showed that the structural protein genes (including the N protein 
coding region) tolerate major changes including combinations of 
single-site mutations and rearrangements of entire genes, suggest- 
ing that the 3’-proximal coding regions do not form part of the 
3’ cis-acting element (de Haan et al., 2002; Goebel et al., 2004b; 
Lorenz et al., 2011). Similarly, for members of the species Alpha- 
coronavirus 1 (TGEV, FCoV), it was shown that the N gene was not 
required for replication (Izeta et al., 1999) and even deletions of the 
accessory protein genes 7a and 7b were tolerated in FCoV, suggest- 
ing that the 3’ cis-acting replication signals do not exceed 283 nts 
plus poly(A) tail (Haijema et al., 2004). For the gammacoronavirus 
IBV, a minimal 3’-terminal sequence of 338 nts was reported to 
be required for DI RNA replication (Dalton et al., 2001), supporting 
the idea that, across all coronavirus genera, the 3’ UTR contains all 
the cis-acting elements required for replication. For MHV, it was 
shown that a significantly smaller fragment of no more than 55 
nts suffices for the initiation of negative-strand RNA synthesis (Lin 
et al., 1994), suggesting differential requirements for plus- versus 
minus-strand RNA synthesis. Furthermore, using betacoronavirus 
DI RNA systems, a short poly(A) tract of at least 5-10 nts was found 
to be required for replication (Spagnolo and Hogue, 2000). 
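As a purely illustrative aid (not part of the cited studies), the poly(A) requirement described above can be expressed as a simple check on a sequence string. The following Python sketch counts the uninterrupted run of adenylates at the 3’ end of an RNA. The function names and the default threshold of 5 nts (the lower bound of the reported 5-10 nt range) are assumptions made for this example.

```python
def poly_a_tail_length(rna: str) -> int:
    """Return the number of consecutive A residues at the 3' end."""
    seq = rna.strip().upper()
    # Stripping trailing 'A' characters and comparing lengths gives
    # the length of the uninterrupted 3'-terminal adenylate run.
    return len(seq) - len(seq.rstrip("A"))


def meets_minimum_tail(rna: str, minimum: int = 5) -> bool:
    """Check the tail against an assumed minimum length (hypothetical default)."""
    return poly_a_tail_length(rna) >= minimum


if __name__ == "__main__":
    toy_rna = "GGAAGAGCU" + "A" * 7  # toy sequence, not a real 3' UTR
    print(poly_a_tail_length(toy_rna), meets_minimum_tail(toy_rna))
```

Such a check says nothing about function; it only makes explicit what "a short poly(A) tract of at least 5-10 nts" means at the sequence level.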


2.5. Structural and functional features of coronavirus 3’ 
cis-acting elements 


Cis-acting RNA elements present in the 3’-terminal genome 
regions have been studied most extensively in betacoronaviruses. 
The first essential RNA structural element, called bulged stem- 
loop (BSL), was discovered in MHV (Hsue and Masters, 1997). It 
comprises 68 nts immediately downstream of the MHV N gene 
stop codon and was discovered as a result of attempts to replace 
the MHV 3’ UTR with that of BCoV (Hsue and Masters, 1997). 
Despite limited sequence identity in the 3’ UTRs of the two viruses, 
replacements of the entire 3’ UTR (and specific parts of it) were 
tolerated, suggesting the presence of conserved structures (rather 
than sequences) that perform cis-acting functions in betacoron- 
avirus replication (Hsue and Masters, 1997). The conservation of 
functionally equivalent elements in the 3’ (and 5’) UTR(s) among 
betacoronavirus genomes was further supported by a study show- 
ing that a BCoV-derived reporter DI RNA was efficiently replicated 
by a range of BCoV- and MHV-related betacoronaviruses (Wu et al.,
2003). Interestingly, a possible BSL equivalent was also identified in 
IBV and other gammacoronaviruses and its functional significance 
was demonstrated using IBV DI RNA constructs (Dalton et al., 2001). 
The nearly perfect stem-loop structure in IBV comprises 42 nts and
is located at the upstream end of region II, a conserved region in
the gammacoronavirus 3’ UTR.

Fig. 2. Alignment-based secondary structure prediction of 5’ genome regions of alphacoronaviruses. The viruses included in this analysis represent all currently recognized
species in the genus Alphacoronavirus. The alignment was calculated by LocARNA, the structure by RNAalifold. The consensus sequence is represented using the IUPAC code.
Colors are used to indicate conserved base pairs: from red (conservation of only one base-pair type) to purple (all six base-pair types are found); from dark (all sequences
contain this base pair) to light colors (1 or 2 sequences are unable to form this base pair). The gray bars below the alignment indicate the extent of sequence conservation at
a given position. Gray shadows are used to link RNA structures with the corresponding dot-bracket notations above the alignment. To refine the alignment, an anchor at the
highly conserved core TRS-L was used.

The second essential RNA element within the (betacoronavirus) 
3’ UTR is a classical hairpin-type RNA pseudoknot (PK) structure 
that was first discovered in BCoV and shown to be required for DI 
RNA replication (Williams et al., 1999). In BCoV, the PK comprises 
54 nts and is located immediately downstream of the BSL. Equivalent
PK structures were predicted to be conserved in beta- and
alphacoronaviruses while gammacoronaviruses were proposed to
retain only a few features of this PK or to lack this structure 
altogether (Williams et al., 1999). In a subsequent study using a 
reverse genetics approach, the functional significance of the PK 
in genome replication was demonstrated for MHV (Goebel et al., 
2004a). Also the more distantly related betacoronaviruses HCoV- 
HKU1 (Woo et al., 2005) and SARS-CoV were confirmed to contain 
this PK structure (Goebel et al., 2004b). The 3’ UTR regions of
MHV and SARS-CoV, which only share 38% sequence identity, were
shown to be interchangeable, again supporting the conservation
of functionally equivalent structures among different betacoro-
navirus lineages. Apparently, this conservation does not extend
to alpha- and gammacoronaviruses because replacements of the
MHV 3’ UTR with that of TGEV and IBV, respectively, did not give
rise to viable MHV mutants (Goebel et al., 2004b). Together, these
data suggest that coronaviruses evolved several genus-specific cis-
acting RNA elements. For example, the presence of a BSL followed
by a PK structure is limited to betacoronaviruses, while other gen-
era appear to contain only one of these elements, with the PK
being conserved in alphacoronaviruses and the BSL in gammacoro-
naviruses (Dalton et al., 2001; Hsue and Masters, 1997; Williams
et al., 1999) (see Section 2.6).

Fig. 3. RNA secondary structure of the HCoV-229E 5’ UTR. (A) The RNA secondary structure of the 5’ UTR + 20 nts was predicted using RNAfold --noLP. (B) RNA secondary
structure of the 5’ UTR + 20 nts was predicted using RNAfold --noLP -C. Structure probing data were used as constraints. The TRS-L core sequence and translational start
codons are indicated.

The structures and functionally important substructures of both 
the BSL and PK have been characterized in significant detail for 
BCoV and MHV (Goebel et al., 2004a; Hsue et al., 2000; Williams 
et al., 1999). In the primary structure, the BSL and PK regions over- 
lap by several nucleotides. Formation of the first stem of the PK 
structure requires base-pairing interactions with the downstream 
segment F of the BSL, thereby destabilizing the latter structure. In
an extensive MHV mutagenesis study, the functional significance
of both structures was demonstrated conclusively. Because the two 
structures cannot exist simultaneously and, yet, each of them is 
essential for viral replication, it was proposed that the two elements 
may adopt alternate structures that act as a ‘molecular switch’ 
controlling the transition between different steps of the viral repli- 
cation cycle (Goebel et al., 2004a). Initial mechanistic insight into 
how this ‘molecular switch’ might work was obtained in a subse- 
quent study that provided evidence for a direct interaction between 
loop 1 of the PK with the extreme 3’ end of the MHV genome (Züst
et al., 2008). The characterization of second-site revertants aris- 
ing from MHV mutants with genetically engineered oligonucleotide 
insertions in loop 1 revealed distinct replacements at the extreme 
3’ end, thereby retaining specific base-pairing interactions with the 
loop 1 region and thus precluding the formation of stem 1 of the PK. 
Another set of mutants contained second-site mutations that sug- 
gested specific interactions of the PK region with nsp8 and nsp9. 
Based on this study, a model was proposed in which the formation 
and disruption of the PK by differential base-pairing interactions 
with the BSL and 3’-terminal genome sequences, respectively, 
may lead to alternate structures that govern different steps of the
initiation and continuation of negative-strand RNA synthesis (Züst
et al., 2008). Further evidence to support this model was obtained
in a subsequent MHV reverse genetics study (Liu et al.,
2013). Thermodynamic investigations revealed a limited stability
of the PK structure (Stammler et al., 2011), further supporting the
structural flexibility of this cis-acting element and, thus, its pro-
posed role as a ‘molecular switch’.

Fig. 4. RNA secondary structure of the HCoV-NL63 5’ UTR. (A) The RNA secondary structure of the 5’ UTR + 20 nts was predicted using RNAfold --noLP. (B) RNA secondary
structure of the 5’ UTR + 20 nts was predicted using RNAfold --noLP -C. Structure probing data were used as constraints. The TRS-L core sequence and translational start
codons are indicated.

R. Madhugiri et al. / Virus Research 194 (2014) 76-89 83

The region downstream of the PK is less well conserved among 
betacoronaviruses. It is generally referred to as the “hypervariable 
region (HVR)” and is not identical to the HVR identified at the 5’ 
end of the 3’ UTR in IBV (Dalton et al., 2001; Williams et al., 1993). 
The betacoronavirus HVR was predicted to contain a complex RNA 
structure whose existence and functional significance were supported
by enzymatic probing and MHV DI RNA mutagenesis studies
(Liu et al., 2001). By contrast, more recent studies showed that large 
parts or even the entire HVR can be deleted without causing
major defects in MHV replication, arguing against an important role 
of this genome region in viral replication (Goebel et al., 2007; Züst
et al., 2008). Interestingly, however, MHV HVR mutants proved to be
highly attenuated in vivo, suggesting a possible role in pathogenesis 
(Goebel et al., 2007). 

About 70-80 nts from the 3’ end of the coronavirus genome, 
there is a conserved octanucleotide sequence, 5’-GGAAGAGC-3’, 
which was identified in early coronavirus sequence analyses performed
in the mid to late 1980s (Boursnell et al., 1985; Lapps et al., 1987;
Schreiber et al., 1989) and subsequently found to be universally 
conserved across all coronavirus genera, with only very few viruses 
containing single-site replacements in this sequence (Goebel et al., 
2007). This strict conservation suggests an important functional 
role for the octanucleotide sequence. To date, however, the function 
of the sequence has not been identified. As mentioned above, the 
entire HVR including the octanucleotide sequence can be deleted 
from the MHV genome without causing major defects in viral repli- 
cation in vitro (Goebel et al., 2007). In line with this, replacements 
of single nucleotides within the octanucleotide motif were toler- 
ated although, in most cases, the octanucleotide mutants exhibited 
small-plaque phenotypes and/or delayed single-step growth kinet- 
ics. In both high- and low-multiplicity-of-infection experiments, 
octanucleotide and HVR deletion mutants lagged behind the wild- 
type virus but reached near-wildtype titers at later time points and 
had no detectable defect in gRNA or sgRNA synthesis (Goebel et al., 
2007). 
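To make the positional statement about the octanucleotide concrete, the following Python sketch scans a 3’-terminal sequence for 5’-GGAAGAGC-3’ and reports how many nucleotides lie downstream of each match. It is only an illustration: the function name, the tolerance of one mismatch (mirroring the single-site replacements mentioned above), and the decision to measure the distance to the end of whatever sequence is supplied (with or without a poly(A) tail) are assumptions of this example.

```python
OCTANUCLEOTIDE = "GGAAGAGC"  # conserved motif quoted in the text


def find_octanucleotide(utr: str, motif: str = OCTANUCLEOTIDE,
                        max_mismatches: int = 1):
    """Return (start, nts_downstream, mismatches) for each motif match.

    start is 0-based; nts_downstream counts the residues between the
    end of the match and the 3' end of the supplied sequence.
    """
    seq = utr.strip().upper().replace("T", "U")
    hits = []
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            nts_downstream = len(seq) - (start + len(motif))
            hits.append((start, nts_downstream, mismatches))
    return hits


if __name__ == "__main__":
    # Toy sequence: 10 filler nts, the motif, then 75 filler nts.
    toy_utr = "C" * 10 + OCTANUCLEOTIDE + "U" * 75
    print(find_octanucleotide(toy_utr, max_mismatches=0))
```

Applied to real 3’ UTR sequences, the reported distances would be expected to fall in the range of roughly 70-80 nts described above, although this has to be verified for each genome.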

MHV and BCoV DI RNA studies provided evidence that the
poly(A) tail present at the 3’ end of coronavirus genomes is another
essential component of the coronavirus 3’ cis-acting signals, with 
a minimum of 5 to 10 adenylate residues being required for DI 
RNA replication (Spagnolo and Hogue, 2000). This requirement cor- 
responds well to the minimal binding site of the poly(A)-binding 
protein (PABP) on DI RNA poly(A) sequences (Spagnolo and Hogue,
2000). Recent studies further suggest that, in the course of BCoV 
infection, 3’ poly(A) tail lengths vary between 30 and 65 nts (Wu 
et al., 2013). This poly(A) tail length variation was confirmed to 
occur in beta- and gammacoronavirus infections and in a range of 
cell types, both in vitro and in vivo (Shien et al., 2014). The biological 
significance of these observations remains to be determined. 


2.6. Identification of 3’ terminal RNA structural elements of 
HCoV-229E and HCoV-NL63 


To identify putative cis-acting elements in the alphacoronavirus 
3’ UTR, we used a range of RNA folding algorithms to identify RNA 
structural elements in the HCoV-229E and HCoV-NL63 3’-terminal 
genome regions encompassing the last 20 nts from the N gene and 
the entire 3’-UTR. Because of the length of these sequences (300 
nts), many local secondary structures with similar free energies 
and base pair probabilities were identified and, thus, it proved to
be difficult to make reliable predictions on stable RNA secondary
structures in this region. As described above for the 5’ UTR, we 
therefore decided to use a combination of sequence and structural 
alignments of all currently recognized alphacoronavirus species to 
identify conserved RNA structures. The predictions were then fur- 
ther refined using structure probing data obtained for HCoV-229E 
and HCoV-NL63. 
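The figure legends state that constrained predictions were generated with RNAfold --noLP -C, using structure probing data as constraints. How the probing data were translated into constraints is not described here, so the following Python sketch is only one possible reading: it assumes that positions scored as single-stranded are marked with "x" (base must not pair) and all other positions with "." (no constraint), following the ViennaRNA constraint notation. The function names and the toy input are hypothetical.

```python
def probing_to_constraint(length: int, unpaired_positions) -> str:
    """Build a constraint line: 'x' = must be unpaired, '.' = unconstrained.

    unpaired_positions uses 1-based nucleotide numbering.
    """
    constraint = ["."] * length
    for pos in unpaired_positions:
        if not 1 <= pos <= length:
            raise ValueError(f"position {pos} lies outside the sequence")
        constraint[pos - 1] = "x"
    return "".join(constraint)


def rnafold_input(name: str, sequence: str, unpaired_positions) -> str:
    """Format a FASTA-like record with the constraint on the line after the sequence."""
    constraint = probing_to_constraint(len(sequence), unpaired_positions)
    return f">{name}\n{sequence}\n{constraint}\n"


if __name__ == "__main__":
    # Toy hairpin in which the loop residues 4-6 were scored as unpaired.
    print(rnafold_input("toy", "GGGAAACCC", [4, 5, 6]), end="")
```

The resulting text could then be supplied to RNAfold with the options named in the legends (--noLP -C). Whether this simple mapping matches the procedure actually used in this study is not known from the text.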

The validity of the approach was first tested by analyzing con- 
served RNA structural elements in the betacoronavirus 3’ UTR for 
which a large body of information has been obtained in previous 
structural and functional studies (see above). As shown in Fig. 5, 
we were able to detect conserved RNA structural elements in the 
betacoronavirus 3’ UTR, including the BSL and the two SL structures 
that form the PK immediately downstream of the BSL. Consistent 
with previous studies (Goebel et al., 2004a), our predictions suggest 
that the formation of the PK requires structural rearrangements at 
the base of the BSL to permit the base-pairing interactions required 
to form PK stem 1, the latter involving the PK-SL2 loop sequence 
and the BSL 3’-terminal sequence (Fig. 5A and B). Interestingly, 
our analyses also revealed another conserved structural element, 
a short hairpin, immediately upstream of the PK-SL2. Formation of 
this hairpin would compete with base-pairing interactions required 
to form the basal part of the BSL and the PK stem 1, respectively 
(Fig. 5B). Furthermore, the hairpin overlaps partly with the PK 
loop 1 region that, in a previous study, was suggested to interact
with the extreme 3’ end of the genome (Züst et al., 2008). The
conservation of both structure and sequence of this hairpin suggests
a biological function for this element. In this context, it may
be worth mentioning that the hairpin structure is predicted to be 
disrupted by the 6-nt insertion in loop 1 that was reported previ- 
ously to cause a poorly replicating and unstable phenotype in MHV 
(Goebel et al., 2004a). It remains to be seen if the small hairpin 
represents yet another element in the intricate network of base- 
pairing interactions between the BSL, the PK, and the 3’ end that 
together constitute the complex molecular switch proposed by the 
Masters laboratory (Goebel et al., 2004a). Consistent with previous 
studies, we identified only one conserved RNA structural element 
in the HVR downstream of PK-SL2. This stem-loop contained the 
conserved octanucleotide sequence in a single-stranded region. As 
pointed out above, the role of this element is currently unclear as 
both the HVR and octanucleotide sequence proved to be dispensable
for betacoronavirus (MHV) replication in vitro (Goebel et al., 2007; 
Liu et al., 2001). Overall, the alignment-based structure prediction 
algorithms used in our analysis led to conclusions that were consis- 
tent with results obtained in previous studies on betacoronavirus 3’ 
UTRs, suggesting that the approach might also be suitable to make 
reliable predictions on conserved RNA structural elements in the 
highly variable alphacoronavirus 3’ UTRs. 
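The alignment-based predictions discussed here rest on how well individual base pairs are conserved across sequences, which the figure legends summarize as the number of base-pair types observed (one to six) and the number of sequences unable to form a given pair. The following Python sketch computes these two numbers for one pair of alignment columns. It is a simplified illustration and not the scoring used by RNAalifold. The function name and the toy alignment are assumptions.

```python
# The six base-pair types: Watson-Crick pairs plus G-U wobble pairs.
CANONICAL_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def column_pair_summary(aligned_seqs, i: int, j: int):
    """Summarize pairing between alignment columns i and j (0-based).

    Returns (number of distinct base-pair types,
             number of sequences that cannot form the pair).
    """
    pair_types = set()
    incompatible = 0
    for seq in aligned_seqs:
        pair = (seq[i] + seq[j]).upper().replace("T", "U")
        if pair in CANONICAL_PAIRS:
            pair_types.add(pair)
        else:
            incompatible += 1  # includes gaps and non-canonical combinations
    return len(pair_types), incompatible


if __name__ == "__main__":
    toy_alignment = ["GAAAC", "AAAAU", "CAAAG", "GAAAU", "AAAAA"]
    print(column_pair_summary(toy_alignment, 0, 4))
```

In this toy alignment, four different base-pair types support the pair between the first and last column, while one sequence cannot form it. Several base-pair types at one position indicate covariation, the kind of evidence referred to in the legends.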

Our alignment-based secondary structure predictions using 
representative viruses from all currently recognized alphacoron- 
avirus species revealed the conservation of several RNA structural 
elements in the alphacoronavirus 3’ UTR. A counterpart of the beta- 
coronavirus BSL structure (Goebel et al., 2004a; Hsue and Masters, 
1997) could not be identified in the alphacoronavirus 3’ UTR while 
structural elements suitable to form a PK structure could be iden- 
tified in all alphacoronaviruses (Fig. 6A). Interestingly, despite the 
absence of an upstream BSL in alphacoronaviruses, the formation 
of this putative PK structure is predicted to require the disruption 
of a short hairpin immediately upstream of PK-SL2, a scenario that 
is similar to, but less complex than, the situation described
above for betacoronaviruses. It remains to be investigated in further 
studies whether or not alphacoronaviruses employ a molecular 
switch mechanism similar to what has been described for beta- 
coronaviruses (Goebel et al., 2004a). 
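The notion that two structures are mutually exclusive can be stated precisely: they cannot coexist if at least one nucleotide would have to pair with different partners in the two structures. The following Python sketch illustrates this test for a hairpin given in dot-bracket notation and a second helix given as a list of base pairs (a pseudoknot stem cannot be written in the same simple dot-bracket string). All coordinates are toy values and the function names are hypothetical. The sketch does not use the actual alphacoronavirus or betacoronavirus structures.

```python
def pairs_from_dot_bracket(structure: str, offset: int = 0):
    """Convert dot-bracket notation into a set of (i, j) base pairs (0-based)."""
    stack, pairs = [], set()
    for idx, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(idx + offset)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), idx + offset))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def conflicting_positions(pairs_a, pairs_b):
    """Return positions that pair with different partners in the two structures."""
    partner_a = {}
    for i, j in pairs_a:
        partner_a[i] = j
        partner_a[j] = i
    conflicts = set()
    for i, j in pairs_b:
        for pos, partner in ((i, j), (j, i)):
            if pos in partner_a and partner_a[pos] != partner:
                conflicts.add(pos)
    return sorted(conflicts)


if __name__ == "__main__":
    hairpin = pairs_from_dot_bracket("((((....))))")      # positions 0-11
    pk_stem = {(8, 19), (9, 18), (10, 17), (11, 16)}     # toy long-range stem
    print(conflicting_positions(hairpin, pk_stem))
```

In this toy case the 3’ strand of the hairpin (positions 8-11) is also needed for the second stem, so forming one structure requires opening the other. An empty result would mean that the two structures are compatible.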

Fig. 5. Alignment-based secondary structure prediction of betacoronavirus 3’ genome regions. The viruses included in this analysis represent all currently recognized
lineages and species in the genus Betacoronavirus. The alignment was generated using LocARNA and the structure was calculated using RNAalifold. The consensus sequence
is represented using the IUPAC code. Colors are used to indicate conserved base pairs: from red (conservation of only one base-pair type) to purple (all six base-pair types are
found); from dark (all sequences contain this base pair) to light colors (1 or 2 sequences are unable to form this base pair). Gray bars below the alignment indicate the extent
of sequence conservation. Gray shadows are used to link RNA structures with the corresponding dot-bracket notations above the alignment. (A) Alignment-based secondary
structure prediction of the bulged stem-loop (BSL) in the 3’ UTR. (B) Alignment-based secondary structure prediction of the pseudoknot (PK) region in the 3’ UTR. Note that
PK-SL1 is an alternate structure that requires base-pairing interactions between the loop region of PK-SL2 and the basal part of the BSL shown in (A). Formation of the BSL
basal part and PK structure, respectively, are mutually exclusive (see text for details). (C) Alignment-based secondary structure prediction of the hypervariable region (HVR)
in the 3’ UTR. To refine the alignment, an anchor at the highly conserved octanucleotide sequence was used.

The predicted SL2 structure (Fig. 6) could be confirmed by
structure probing data obtained for HCoV-229E and HCoV-NL63
(Fig. 7, detailed structure probing data to be published elsewhere).
Furthermore, our structure probing data indicated base-pairing 
interactions upstream of SL2 in HCoV-NL63, supporting the for- 
mation of the predicted small hairpin in this region (Fig. 7A and 
B). By contrast, no such interactions were seen in HCoV-229E. Also, 
the structure probing data did not support the formation of a sta- 
ble PK structure, possibly reflecting a similarly low thermodynamic 
stability as determined for the equivalent PK in betacoronaviruses 
(Stammler et al., 2011). Further studies including reverse genetics 
experiments are required to confirm the existence and biological 
significance of the predicted alphacoronavirus PK structure. 

With respect to the HVR downstream of PK-SL2, an extensive 
stem-loop structure was predicted in our analyses of alpha- 
coronavirus 3’ UTRs (Fig. 6B). The structure is supported by a 
large number of covariant base pairs and contains the conserved
octanucleotide sequence in a single-stranded region. The large distal
part of the stem-loop structure was further corroborated by
structure probing data (Fig. 7, details to be published elsewhere). 
In both HCoV-229E and HCoV-NL63, the octanucleotide sequence 
was found to be located in the loop region. Of note, the cell culture- 
adapted HCoV-NL63 isolate used for our structure probing analysis 
contained a short deletion apparently acquired upon serial pas- 
saging in cell culture, resulting in a significantly smaller loop but 
retaining the octanucleotide sequence (with one G-to-A replace- 
ment) in an identical position in the loop when compared to 
HCoV-229E (see Fig. 7A and B). This serendipitous deletion shows 
that the distal part of the extended stem-loop structure is not essential
for HCoV-NL63 replication in cell culture. The data also indicate
that, despite the deletion, the octanucleotide sequence retains a 
position in a loop region of the stem-loop structure and tolerates
minimal changes, the latter being consistent with MHV reverse
genetics data obtained for the HVR/octanucleotide region (Goebel
et al., 2007).

Fig. 6. Alignment-based secondary structure prediction of alphacoronavirus 3’-terminal genome regions. The viruses included in this analysis represent all currently recognized
species in the genus Alphacoronavirus. The alignment was calculated by LocARNA, the structure by RNAalifold. The consensus sequence is represented using the IUPAC
code. Colors are used to indicate conserved base pairs: from red (conservation of only one base-pair type) to purple (all six base-pair types are found); from dark (all sequences
contain this base pair) to light colors (1 or 2 sequences are unable to form this base pair). Gray bars below the alignment indicate the extent of sequence conservation at a
given position. Gray shadows are used to link RNA structures with the corresponding dot-bracket notations above the alignment. (A) Alignment-based secondary structure
prediction of the pseudoknot (PK) region in the 3’ UTR. Formation of the stem of PK-SL1 requires base-pairing interactions with the loop region of SL2. Formation of the PK
and the two SL structures shown above the alignment are mutually exclusive (see text for details). (B) Alignment-based secondary structure prediction of the hypervariable
region (HVR) in the 3’ UTR. To refine the alignment, an anchor at the highly conserved octanucleotide sequence was used.

Fig. 7. RNA secondary structure predictions of 3’-terminal genome regions of HCoV-229E (A) and HCoV-NL63 (B). Predictions were generated using RNAfold --noLP -C. As
constraints, structure probing data were used. Formation of the predicted pseudoknot (PK) requires base-pairing interactions between the loop region of SL2 and an upstream
sequence (and, possibly, structural rearrangements), resulting in the formation of PK stem 1 (PK-S1) as indicated. Also shown is the octanucleotide sequence that is conserved
across all genera of the Coronavirinae.


3. Conclusions 


Based on numerous studies on betacoronaviruses including 
structural, biochemical, and reverse genetic work (DI RNA and 
replication-competent virus), a picture of putative cis-acting ele- 
ments is beginning to emerge (reviewed in Masters, 2007; Sola 
et al., 2011b). Previous work also identified a growing number 
of cellular and viral proteins that bind to these structures and 
likely have important functions at different steps of genomic and/or 
subgenomic RNA synthesis, genome packaging, genome expres- 
sion or intracellular targeting of structures engaged in viral RNA 
synthesis (reviewed in Narayanan and Makino, 2007; Sola et al., 
2011b). 

Using a combination of bioinformatic and biochemical meth- 
ods, the present study confirms and extends this previous work. 
Our study suggests that RNA secondary structure elements may 
be more conserved than previously thought, both within indi- 
vidual coronavirus genera and across different genera as shown 
here for the genera Alphacoronavirus and Betacoronavirus. Although 
significantly more work is needed to further characterize the struc- 
tures identified in this and previous studies and understand their 
functional role, the available data suggest a cross-genus conser- 
vation of a number of RNA structural elements among alpha- and 
betacoronaviruses. The conservation pattern is consistent with the 
replicase gene-based classification of genera within the subfamily 
Coronavirinae (de Groot et al., 2012a). Conserved elements include 
stem-loops 1, 2, 4, and, possibly, 5 in the 5’-terminal genome region 
and a putative PK in the 3’ UTR. The data further suggest that, in 
both alpha- and betacoronaviruses, the formation of the PK may 
require structural rearrangements in other regions (upstream of the 
SL2) and it remains an attractive idea to suggest specific functions 
for these alternative structures whose mechanistic and functional 
details remain to be investigated (Goebel et al., 2004a; Züst et al.,
2008). Finally, in line with previous observations (Brian and Baric,
2005; Masters, 2007; Sola et al., 2011b), the study confirms a significant
degree of variation in the 3’-terminal region of the 3’ UTR, with
the conserved octanucleotide sequence being consistently located 
in a single-stranded region of a stem-loop structure. Interestingly, 
specific lineages of beta-, gamma- and deltacoronaviruses (as well 
as other plus-strand RNA viruses) may contain another structural 
element, called s2m, in the 3’ UTR, thus further adding to the vari- 
ability of this genome region in coronaviruses (Jonassen et al., 1998; 
Robertson et al., 2005; Tengs et al., 2013). The conservation pattern 
of the various lineage-specific structural elements argues against a 
conserved function in viral RNA synthesis but rather suggests that 
the 3’ UTR may contain elements that are involved in specific virus- 
host interactions and/or pathogenesis as has been shown for the 
MHV HVR (Goebel et al., 2007). 
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